Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature: the costs due to misclassification vary between
examples, not only between classes. However, standard classification methods
do not take these costs into account and assume a constant cost of
misclassification errors. Previous work has proposed methods that incorporate
the financial costs into the training of different algorithms, with the
example-dependent cost-sensitive decision tree algorithm yielding the highest
savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision trees. The
framework consists of training different example-dependent cost-sensitive
decision trees on random subsamples of the training set and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches: cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases from four
real-world applications (credit card fraud detection, churn modeling, credit
scoring and direct marketing), we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms achieve better results on
all databases, in the sense of higher savings.
Comment: 13 pages, 6 figures, submitted for possible publication
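The cost-sensitive weighted voting combiner described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: per-example false-positive/false-negative cost vectors are assumed as inputs, and the savings measure is simplified to compare against an all-negative baseline (the full measure also considers the all-positive baseline).

```python
import numpy as np

def savings(y_true, y_pred, cost_fp, cost_fn):
    """Example-dependent savings: fraction of cost avoided relative to the
    all-negative baseline. cost_fp / cost_fn are per-example cost vectors."""
    cost = np.where(y_pred == 1,
                    np.where(y_true == 1, 0.0, cost_fp),   # false positives
                    np.where(y_true == 1, cost_fn, 0.0))   # false negatives
    base = cost_fn * (y_true == 1)   # cost of predicting every example negative
    return 1.0 - cost.sum() / base.sum()

def cost_sensitive_weighted_vote(tree_preds, weights):
    """Combine binary predictions from several trees, weighting each tree by
    its validation-set savings (normalized so the weights sum to one)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    score = np.asarray(tree_preds, dtype=float).T @ w   # weighted vote per example
    return (score >= 0.5).astype(int)
```

A tree that saves more money on a held-out set thus gets a proportionally larger say in the final vote, which is the intuition behind weighting by savings rather than by accuracy.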
Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning
Reinforcement learning has been explored for many problems, from video games
with deterministic environments to portfolio and operations management in which
scenarios are stochastic; however, there have been few attempts to test these
methods on banking problems. In this study, we sought to find and automate an
optimal credit card limit adjustment policy by employing reinforcement learning
techniques. In particular, because of the historical data available, we
considered two possible actions per customer, namely increasing or maintaining
an individual's current credit limit. To find this policy, we first formulated
this decision-making question as an optimization problem in which the expected
profit was maximized; therefore, we balanced two adversarial goals: maximizing
the portfolio's revenue and minimizing the portfolio's provisions. Second,
given the particularities of our problem, we used an offline learning strategy
to simulate the impact of the action based on historical data from a super-app
(i.e., a mobile application that offers various services from goods deliveries
to financial products) in Latin America to train our reinforcement learning
agent. Our results show that a Double Q-learning agent with optimized
hyperparameters can outperform other strategies and generate a non-trivial
optimal policy reflecting the complex nature of this decision. Our research not
only establishes a conceptual structure for applying a reinforcement learning
framework to credit limit adjustment, presenting an objective technique to make
these decisions primarily based on data-driven methods rather than relying only
on expert-driven systems, but also provides insights into the effect of
alternative data usage in determining these modifications.
Comment: 29 pages, 16 figures
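The Double Q-learning update at the core of the agent described above is standard and can be sketched compactly. The state discretization, reward, and sizes below are placeholders; in the paper, states and rewards come from the super-app's historical data, and the two actions are maintain (0) or increase (1) the credit limit.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2   # hypothetical discretized customer states; 0=maintain, 1=increase
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95     # learning rate and discount factor (illustrative values)

def double_q_update(s, a, r, s_next):
    """One Double Q-learning step: one table selects the greedy next action,
    the other evaluates it, which reduces the maximization bias of plain
    Q-learning."""
    if rng.random() < 0.5:
        best = int(np.argmax(Q_a[s_next]))
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, best] - Q_a[s, a])
    else:
        best = int(np.argmax(Q_b[s_next]))
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, best] - Q_b[s, a])
```

In an offline setting like the one in the paper, `(s, a, r, s_next)` tuples are replayed from logged customer histories rather than collected by interacting with live customers.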
Proactive Detractor Detection Framework Based on Message-Wise Sentiment Analysis Over Customer Support Interactions
In this work, we propose a framework relying solely on chat-based customer
support (CS) interactions for predicting the recommendation decision of
individual users. For our case study, we analyzed a total of 16.4k users
and 48.7k customer support conversations within the financial vertical of a
large e-commerce company in Latin America. Consequently, our main contribution
and objective is to use Natural Language Processing (NLP) to assess and
predict recommendation behavior: in addition to static sentiment analysis, we
exploit the predictive power of each user's sentiment
dynamics. Our results show that, with interpretable features, it is possible
to predict the likelihood that a user will recommend a product or service in a
fully automated way, based solely on the message-wise sentiment evolution of
their CS conversations.
Comment: 10 pages, 4 figures, 1 table. Accepted at the NeurIPS 2022 LatinX in AI Workshop
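Turning a per-message sentiment trajectory into "dynamics" features might look like the sketch below. The feature names and choices (trend slope, net change, volatility) are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def sentiment_dynamics_features(scores):
    """Summarize one user's message-wise sentiment trajectory into features.
    `scores` holds one sentiment value per CS message, in chronological order."""
    s = np.asarray(scores, dtype=float)
    t = np.arange(len(s))
    # Slope of a least-squares linear fit captures the overall sentiment trend.
    slope = np.polyfit(t, s, 1)[0] if len(s) > 1 else 0.0
    return {
        "mean": s.mean(),                    # average sentiment over the conversation
        "last_minus_first": s[-1] - s[0],    # net change from first to last message
        "slope": slope,                      # linear trend across messages
        "volatility": s.std(),               # how much sentiment fluctuates
    }
```

Features like these would then feed a downstream classifier that predicts the user's recommendation decision.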
Example-Dependent Cost-Sensitive Classification with Applications in Financial Risk Modeling and Marketing Analytics
Several real-world binary classification problems are example-dependent cost-sensitive in nature: the costs due to misclassification vary between examples, not only between classes. However, standard binary classification methods do not take these costs into account and assume a constant cost of misclassification errors, which is unrealistic in many real-world applications. For example, in credit card fraud detection, failing to detect a fraudulent transaction may have an economic impact ranging from a few to thousands of Euros, depending on the particular transaction and card holder. In churn modeling, a model is used to predict which customers are most likely to abandon a service provider; in this context, failing to identify a profitable or unprofitable churner has a significantly different economic result. Similarly, in direct marketing, wrongly predicting that a customer will not accept an offer when in fact they will may have a different financial impact, as not all customers generate the same profit. Lastly, in credit scoring, accepting loans from bad customers does not carry the same economic loss in every case, since customers have different credit lines and therefore different profit. Accordingly, the goal of this thesis is to provide an in-depth analysis of example-dependent cost-sensitive classification. We analyze four real-world classification problems, namely credit card fraud detection, credit scoring, churn modeling and direct marketing. For each problem, we propose an example-dependent cost-sensitive evaluation measure. We propose four example-dependent cost-sensitive methods. The first is a cost-sensitive Bayes minimum risk classifier, which quantifies tradeoffs between the various decisions using probabilities and the costs that accompany such decisions. Second, we propose a cost-sensitive logistic regression technique.
This algorithm is based on a new logistic regression cost function, one that takes into account the real costs due to misclassification and correct classification. Subsequently, we propose a cost-sensitive decision tree algorithm based on incorporating the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. Lastly, we define an example-dependent cost-sensitive framework for ensembles of decision trees, based on training example-dependent cost-sensitive decision trees using four different random inducer methods and then blending them using three different combination approaches. Moreover, we present the CostCla library, developed as part of this thesis: an open-source implementation of all the algorithms covered in this manuscript. Finally, the experimental results show the importance of using the real example-dependent financial costs associated with real-world applications. We found significant differences in the results when evaluating a model using a traditional cost-insensitive measure such as accuracy or F1-score rather than the financial savings. Moreover, the results show that the proposed algorithms achieve better results on all databases, in the sense of higher savings.
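The Bayes minimum risk classifier mentioned above admits a compact sketch: with per-example costs, predict positive exactly when the expected cost of doing so is lower than that of predicting negative. This is a generic rendering of the decision rule, not the thesis's exact code (the CostCla library implements the full version).

```python
import numpy as np

def bayes_minimum_risk(p1, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Bayes minimum risk decision per example.
    p1: estimated probability of the positive class for each example.
    c_fp, c_fn, c_tp, c_tn: per-example costs of false/true positives/negatives."""
    risk_pos = p1 * c_tp + (1 - p1) * c_fp   # expected cost of predicting positive
    risk_neg = p1 * c_fn + (1 - p1) * c_tn   # expected cost of predicting negative
    return (risk_pos <= risk_neg).astype(int)
```

Note that two examples with the same predicted probability can receive different labels when their costs differ, which is exactly the example-dependent behavior a constant-threshold classifier cannot express.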
Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring
Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. Credit scoring is a typical example of cost-sensitive classification. However, it is usually treated using methods that do not take into account
the real financial costs associated with the lending business. In this paper, we propose a new example-dependent cost matrix for credit scoring. Furthermore, we propose an algorithm that introduces the example-dependent costs into a logistic regression. Using two publicly available datasets, we compare our proposed method against state-of-the-art example-dependent cost-sensitive algorithms. The results highlight the importance of using real financial costs. Moreover, the proposed cost-sensitive logistic regression yields significant improvements in the sense of higher savings.
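The objective behind the cost-sensitive logistic regression can be sketched as an expected-cost loss: each example's predicted probability is weighted by that example's own costs for correct and incorrect classification. Variable names below are illustrative, not the paper's notation.

```python
import numpy as np

def cost_sensitive_log_loss(w, X, y, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Example-dependent expected-cost objective for logistic regression.
    w: weight vector; X: feature matrix; y: binary labels;
    c_fp, c_fn, c_tp, c_tn: per-example costs."""
    p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted positive probability
    # For positives, mass on the negative prediction incurs the FN cost;
    # for negatives, mass on the positive prediction incurs the FP cost.
    loss = (y * (p * c_tp + (1 - p) * c_fn)
            + (1 - y) * (p * c_fp + (1 - p) * c_tn))
    return loss.mean()
```

Minimizing this loss (e.g. with a generic gradient-based optimizer) pushes the model toward decisions that are cheap in financial terms rather than merely accurate.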
A novel cost-sensitive framework for customer churn predictive modeling
Customer churn predictive modeling deals with predicting the probability of a customer defecting using historical, behavioral and socio-economic information. This tool is of great benefit to subscription-based companies, allowing them to maximize the results of retention campaigns. The problem of churn predictive modeling has been widely studied by the data mining and machine learning communities, and is usually tackled with classification algorithms that learn the different patterns of both churners and non-churners. Nevertheless, current state-of-the-art classification algorithms are not well aligned with commercial goals, in the sense that the models fail to include the real financial costs and benefits during the training and evaluation phases. In the case of churn, evaluating a model on a traditional measure such as accuracy or predictive power does not yield the best results when measured by the actual financial cost, i.e., the investment per subscriber on a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner. In this paper, we present a new cost-sensitive framework for customer churn predictive modeling. First, we propose a new financially based measure for evaluating the effectiveness of a churn campaign, taking into account the available portfolio of offers, their individual financial cost and the probability of offer acceptance depending on the customer profile. Then, using a real-world churn dataset, we compare different cost-insensitive and cost-sensitive classification algorithms and measure their effectiveness based on both their predictive power and the cost optimization. The results show that using a cost-sensitive approach yields an increase in cost savings of up to 26.4%.
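The financially based campaign measure combines, per customer, the churn probability, the probability of accepting a retention offer, and the offer's cost. Below is a deliberately simplified single-offer sketch of that idea; the paper's measure handles a whole portfolio of offers, and the function name and inputs here are illustrative.

```python
def churn_campaign_value(p_churn, p_accept, clv, offer_cost):
    """Expected value of targeting one customer with a retention offer:
    the customer lifetime value saved when a true churner accepts the offer,
    minus the cost of making the offer. A customer is worth targeting only
    when this quantity is positive."""
    return p_churn * p_accept * clv - offer_cost
```

This already exposes the core tension the paper addresses: a model that ranks customers purely by churn probability can still lose money if it targets churners who are unlikely to accept any offer or whose lifetime value is below the offer cost.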
Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples. However, standard classification methods do not take these costs into account and assume a constant cost of misclassification errors. State-of-the-art example-dependent cost-sensitive techniques only introduce the cost to the algorithm either before or after training, leaving open the question of the potential impact of algorithms that take the real financial example-dependent costs into account during training. In this paper, we propose an example-dependent cost-sensitive decision tree algorithm, incorporating the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criterion. Then, using three different databases from three real-world applications (credit card fraud detection, credit scoring and direct marketing), we evaluate the proposed method. The results show that the proposed algorithm is the best-performing method on all databases. Furthermore, when compared against a standard decision tree, our method builds significantly smaller trees in only a fifth of the time, while having superior performance measured by cost savings, leading to a method that not only produces more business-oriented results but also creates simpler models that are easier to analyze.
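A minimal sketch of the cost-based impurity idea: a node's impurity is the cost incurred by labeling the entire node with its cheaper class, and a candidate split is scored by how much of that cost it removes. This simplification assumes per-example FP/FN cost vectors and omits details of the paper's full measure (e.g. correct-classification costs).

```python
import numpy as np

def cost_based_impurity(y, c_fp, c_fn):
    """Cost of labeling every example in the node with the cheaper class.
    y: binary labels in the node; c_fp, c_fn: per-example costs."""
    cost_all_pos = c_fp[y == 0].sum()   # false positives if the node predicts 1
    cost_all_neg = c_fn[y == 1].sum()   # false negatives if the node predicts 0
    return min(cost_all_pos, cost_all_neg)

def cost_gain(y, c_fp, c_fn, mask):
    """Cost gain of a split: parent impurity minus summed child impurities.
    mask: boolean array selecting the left child's examples."""
    parent = cost_based_impurity(y, c_fp, c_fn)
    left = cost_based_impurity(y[mask], c_fp[mask], c_fn[mask])
    right = cost_based_impurity(y[~mask], c_fp[~mask], c_fn[~mask])
    return parent - (left + right)
```

Replacing Gini or entropy with such a measure is what lets the tree choose splits that separate expensive errors from cheap ones, rather than splits that merely purify class labels.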